Predictive approach of health indicators from the physical activity habits of active youth

The aim of this study is to analyse the relationship between sport modalities practiced, physical fitness, body composition, and healthy habits in an active young population, using a statistical model for prediction. A total of 2255 children and adolescents (1528 boys and 727 girls) aged 6–17 years from rural areas of Spain who were involved in extracurricular sports participated. Physical fitness was assessed through validated field tests, and body composition was determined using bioelectrical impedance analysis. Adherence to the Mediterranean diet was assessed with the KIDMED questionnaire. The general sport variable was significant for VO2max when comparing the invasion and combat modalities to the reference level (court/net). The sex and age variables revealed significant differences in all physical fitness and body composition parameters. Health parameters, such as hours of additional practice, adherence to the Mediterranean diet, and previous experience, showed significant differences. The study concludes that sport modality, training, sex, age, and maturational period have an impact on body composition and fitness parameters in this population. Therefore, by focusing on factors associated with lower values in health indicators, we can prevent health problems during adulthood, such as cardiorespiratory deficits.


Procedures and measurements
Once the study protocol had been approved, each participant underwent an individual assessment before the training session for the sport modality they were enrolled in. The testing session lasted between 60 and 90 min and was conducted in groups of 12-14 participants. Body composition and physical fitness tests were administered by qualified personnel according to the protocol established in the Active Health project. In addition, the physical fitness tests were ordered so that fatigue did not influence performance in subsequent assessments. Participants and their parents or legal guardians were informed of the research objectives and test characteristics before the study began. Parents or legal guardians were required to give informed consent before their son's or daughter's participation in the testing.

Physical fitness
To assess the different parameters of physical fitness, an adapted version of the extended Assessing Levels of Physical Activity (ALPHA) health-related fitness battery for children and adolescents 21 was used. The ALPHA-Fitness battery, established by Ruiz et al. 21, is structured around three core components: aerobic capacity, musculoskeletal strength, and body composition. Researchers conducted all the fitness tests, which were performed in the sequence described here. To evaluate upper-body muscular strength, a handgrip strength test was performed with an adjustable-grip hand dynamometer (Constant R Model: 14192-709E, China). Participants were instructed to keep the elbow fully extended and to exert continuous maximal force with the hand for 3 s. The test was performed alternately with the dominant and non-dominant hand, with a 30-s rest period between attempts. Each participant's best performance with the dominant hand was recorded to the nearest 1 g and expressed in kg as an absolute value 21.
The vertical jump test was used to determine lower-body muscular strength. During a countermovement jump (CMJ) with arm swing, the maximum vertical jump height was measured with an accuracy of 0.1 cm using photoelectric cells formed by two parallel bars (Optojump, Microgate, Bolzano, Italy). The CMJ with arm swing is a reliable and valid field test for determining muscular fitness 22. This system measures the time of flight as the interval between take-off and landing. Prior to data collection, the CMJ test was explained to each participant. Participants were directed to jump for maximal vertical height, starting with a swift downward preparatory eccentric movement and with free use of the arms. Each participant performed three jumps, and the highest was retained as the final performance measurement.
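The relation between flight time and jump height can be sketched as follows. This is a minimal illustration, assuming the standard projectile relation h = g·t²/8 (take-off and landing in the same body position); the source states only that the system measures flight time, so the exact computation used by the device is an assumption, and the function names and flight times are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def jump_height_cm(flight_time_s: float) -> float:
    """Jump height in cm from flight time, assuming h = g * t^2 / 8."""
    if flight_time_s < 0:
        raise ValueError("flight time must be non-negative")
    return G * flight_time_s ** 2 / 8 * 100


def best_jump_cm(flight_times_s) -> float:
    """Highest of the recorded jumps, reported to 0.1 cm."""
    return round(max(jump_height_cm(t) for t in flight_times_s), 1)


# Hypothetical flight times (s) for the three attempts of one participant
best = best_jump_cm([0.40, 0.50, 0.45])
```

Taking the maximum over the three attempts mirrors the scoring rule described in the text.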
Cardiorespiratory fitness was assessed with a maximal incremental field test (20-m Shuttle-Run Test). Participants were instructed to run back and forth between two lines spaced 20 m apart while keeping the pace set by acoustic signals from a portable speaker. The test began at a speed of 8.5 km h−1, which was increased by 0.5 km h−1 every minute, according to the protocol established by Leger et al. 23. The test was considered complete when the participant failed to reach the line in time with the audio signal on two consecutive occasions. Alternatively, the test concluded when the participant stopped due to fatigue. The results were converted into stages of 1-min duration, and the maximal oxygen uptake (VO2max) was estimated using the equation of Leger et al. 23, in which X1 is the final speed (km h−1) and X2 is age (years, rounded down to the nearest integer). The test was performed only once and was scheduled last so that the fatigue it induced would not affect the results of the other tests.
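As an illustration of this estimation step, the sketch below uses the commonly cited form of the Leger et al. equation for children and adolescents, VO2max = 31.025 + 3.238·X1 − 3.248·X2 + 0.1536·X1·X2. The coefficients are not reproduced in this text, so they should be read as an assumption about the equation that was applied; the function names and the example participant are hypothetical.

```python
import math


def final_speed_kmh(last_stage: int) -> float:
    """Speed of a 1-min stage: 8.5 km/h at stage 1, plus 0.5 km/h per stage."""
    if last_stage < 1:
        raise ValueError("stage numbering starts at 1")
    return 8.5 + 0.5 * (last_stage - 1)


def vo2max_leger(speed_kmh: float, age_years: float) -> float:
    """Estimated VO2max (ml/kg/min); coefficients assumed from Leger et al."""
    x1 = speed_kmh
    x2 = math.floor(age_years)  # age as the lower rounded integer
    return 31.025 + 3.238 * x1 - 3.248 * x2 + 0.1536 * x1 * x2


# Hypothetical participant: 12.6 years old, stopped during stage 7
estimate = vo2max_leger(final_speed_kmh(7), 12.6)
```

Because age enters as the lower rounded integer, a participant aged 12.6 years is treated as 12 in the calculation.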

Body composition
Body composition was measured indirectly using a portable segmental multifrequency bioimpedance analyser (BIA) (TANITA MC-780, Tanita Corp., Tokyo, Japan), which has been clinically verified to be accurate and reliable and to provide highly reproducible results 24. Weight (kg), fat mass (kg and %), and muscle mass (kg and %) data obtained by BIA were used. Height (cm) was assessed with a height rod (Seca 214, Hamburg, Germany). Body mass index (BMI) was calculated as weight (kg) divided by height squared (m2). Moreover, the appendicular skeletal muscle mass (ASMM) was calculated as the sum of the muscle mass of the four limbs. According to Salton et al. 25, the muscle-to-fat ratio (MFR) was calculated as MFR = ASMM [kg]/fat mass [kg].
Assessments were carried out with participants clothed and barefoot. Finally, the handgrip strength-to-BMI (HBMI) ratio was calculated as handgrip strength (kg) divided by BMI (kg/m2).
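The derived indices defined in this section reduce to a few ratios, sketched below. The function names and the example values are hypothetical and serve only to make the definitions concrete.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def asmm(limb_muscle_kg) -> float:
    """Appendicular skeletal muscle mass: sum of the four limb values (kg)."""
    if len(limb_muscle_kg) != 4:
        raise ValueError("expected muscle mass for exactly four limbs")
    return sum(limb_muscle_kg)


def mfr(asmm_kg: float, fat_mass_kg: float) -> float:
    """Muscle-to-fat ratio: ASMM (kg) / fat mass (kg)."""
    return asmm_kg / fat_mass_kg


def hbmi(handgrip_kg: float, bmi_value: float) -> float:
    """Handgrip strength-to-BMI ratio: handgrip (kg) / BMI (kg/m^2)."""
    return handgrip_kg / bmi_value


# Hypothetical participant
example_bmi = bmi(45.0, 150.0)
example_asmm = asmm([2.0, 2.1, 6.0, 6.1])
example_mfr = mfr(example_asmm, 9.0)
example_hbmi = hbmi(24.0, example_bmi)
```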

Adherence to the Mediterranean diet
Adherence to the Mediterranean diet (MD) was determined using the KIDMED questionnaire. This test was created and validated in the enKid study 26,27, with a focus on evaluating adherence to the MD among children and adolescents. It has been used in recent systematic reviews 28,29, providing a rich contextual framework for interpreting the outcomes. KIDMED consists of 16 items: 12 items represent a positive score for adherence to the MD, and the remaining 4 items represent a negative score. A positive answer to a question that involves greater adherence to the diet is worth +1 point (Q1-Q5, Q7-Q11, Q13, and Q15). A positive answer to a question that means less adherence to the diet is worth −1 point (Q6, Q12, Q14, and Q16).
Negative answers do not score (a value of 0 is noted). The sum of all values from the administered test is the KIDMED index, which is categorised into three levels: (1) low adherence (very low-quality diet, 0-3); (2) medium or moderate adherence (improvement of the diet is needed, 4-7); and (3) high adherence (optimal adherence to the MD, 8-12) 27.
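The scoring rule described above can be sketched as a short function. The item numbers and cut-offs come from the text; the function names are hypothetical, and assigning any index at or below 3 (including negative sums, which the rule permits) to the low category is an assumption, since the text lists that level as 0-3.

```python
POSITIVE_ITEMS = {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 15}  # +1 if answered yes
NEGATIVE_ITEMS = {6, 12, 14, 16}                            # -1 if answered yes


def kidmed_index(answers) -> int:
    """Sum the item scores; answers maps item number (1-16) to True/False."""
    score = 0
    for item, said_yes in answers.items():
        if not said_yes:
            continue  # negative answers score 0
        if item in POSITIVE_ITEMS:
            score += 1
        elif item in NEGATIVE_ITEMS:
            score -= 1
        else:
            raise ValueError(f"unknown KIDMED item: {item}")
    return score


def kidmed_category(index: int) -> str:
    """Map the index to the three adherence levels (<=3 assumed to be low)."""
    if index >= 8:
        return "high"
    if index >= 4:
        return "medium"
    return "low"


# Hypothetical respondent: yes to Q1-Q5 and to Q6, no to everything else
answers = {q: q <= 6 for q in range(1, 17)}
index = kidmed_index(answers)
level = kidmed_category(index)
```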

Statistical analysis
Data analysis was conducted in R version 4.2.2 (2022-10-31 ucrt) with RStudio 2022.12.0. Participants were classified into three groups based on their age (6-9, 10-13, and 14-17 years old). This classification was checked with a linear discriminant analysis (LDA). The LDA represents a dependent variable through a linear combination of other variables, thereby enabling a more precise classification of cases into specific age groups. This step is crucial for evaluating the model's ability to classify cases accurately by age and to generalize to new data. This analysis and its practical implementations have already been studied 30,31. The predictor variables included in the LDA model were those subsequently analysed as dependent variables. The sample was partitioned 80-20% into training and test data, respectively. Data were centred and scaled before modelling the discriminant function. Table 2 shows the accuracy of the LDA, with 82.4% of cases correctly classified in the test sample and 81% in the training sample. Thus, three groups were accepted as the division method for the age variable. Data distribution was tested using the Kolmogorov-Smirnov test. The variables of weight, height, and BMI were non-normally distributed across the different groups.
A Generalized Additive Model (GAM) was chosen for its versatility in capturing nonlinear relationships between the predictor variables and the response variable. In contrast to conventional linear models, GAMs incorporate smooth functions represented by penalized B-splines, facilitating the modelling of complex nonlinear associations (Hastie & Tibshirani, 1990). This approach is a robust tool for estimating such relationships, particularly when they resist simple predictor transformations or polynomial equations. Despite the interpretational challenges posed by the inclusion of smooth terms, GAMs offer significant advantages in flexibility and predictive performance compared to simpler linear models. The selection of GAMs was therefore based on their ability to model nonlinear associations effectively, while acknowledging that other modelling techniques are available.
The function gam(), from the package mgcv (version 1.8-41), was used for the model fit. The model was built with the predictor variables of general sport, skill sport, years of practice, KIDMED, sex, age, age groups, and the smoothed variables s(weight), s(height), and s(BMI).
www.nature.com/scientificreports/
These variables were used to run the model with each dependent variable (VO2max, fat mass, muscle mass, handgrip strength, vertical jump, MFR, HBMI). The number of knots (basis dimension) used for the different models was k = 20, the value at which the k-index was higher than 1 (p > 0.05). The smoothing parameter was estimated by restricted maximum likelihood. Age groups (6, 10, and 14) were included in the smoothed terms as a factor to allow the interaction between them. The final model for each dependent variable was built from the following terms: V1 = general sport; V2 = skill sport; V3 = sex; V4 = age; V5 = age groups; V6 = years of practice; V7 = KIDMED; V8 = hours per week; V9 = weight; V10 = height; V11 = BMI; s = smooth.
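The preprocessing steps that precede the discriminant analysis (age grouping, the 80-20% partition, and centring and scaling) can be sketched as follows. The analysis itself was run in R; this Python sketch does not fit an LDA or a GAM. It assumes a random shuffle for the partition and a scaler estimated on the training data only, neither of which is specified in the text, and all names and values are hypothetical.

```python
import random
from statistics import mean, stdev


def age_group(age_years: int) -> str:
    """Map a chronological age (6-17 years) to one of the three study groups."""
    if 6 <= age_years <= 9:
        return "6-9"
    if 10 <= age_years <= 13:
        return "10-13"
    if 14 <= age_years <= 17:
        return "14-17"
    raise ValueError("age outside the 6-17 range of the study")


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test), 80-20% by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Centre and spread (sample SD) estimated from the training values."""
    return mean(values), stdev(values)


def apply_scaler(values, centre, spread):
    """Centre and scale values with previously estimated parameters."""
    return [(v - centre) / spread for v in values]


# Hypothetical handgrip values (kg) for ten participants
handgrip = [12.0, 15.5, 18.0, 21.0, 24.5, 27.0, 30.5, 33.0, 36.0, 40.0]
train, test = split_train_test(handgrip)
centre, spread = fit_scaler(train)
train_scaled = apply_scaler(train, centre, spread)
test_scaled = apply_scaler(test, centre, spread)
```

Reusing the training centre and spread on the test data keeps the test sample independent of the fitted transformation, which is one common way to check generalization to new data.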
Model accuracy and error were assessed with the following measures: adjusted R2, standard deviation of the original variable, model standard deviation, deviance explained, mean absolute error, mean absolute percentage error, root mean square error, index of agreement, and Akaike information criterion.
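Four of these error measures can be computed directly from paired observed and predicted values, as sketched below. The index of agreement is written in Willmott's original form, d = 1 − Σ(o − p)² / Σ(|p − ō| + |o − ō|)²; the text does not state which variant was used, so that choice is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt


def mae(obs, pred) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred) -> float:
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred) -> float:
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred) -> float:
    """Willmott's index of agreement (assumed variant); 1 is perfect agreement."""
    o_bar = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pred))
    denominator = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                      for o, p in zip(obs, pred))
    return 1 - numerator / denominator


# Hypothetical observed and predicted values
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
metrics = {
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "IOA": index_of_agreement(observed, predicted),
}
```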

Results
Parametric coefficients of the GAM model are shown as the estimated coefficient and standard error for each dependent variable according to the predictive variables and their levels (Table 3). Categorical variables with more than two levels are expressed as the estimated coefficient relative to the reference level. The reference levels for the categorical variables are court/net (general sport), opened modality (skill sport), and 6-9 (age group).
Concerning VO2max (as shown in Table 3), significant differences were observed in the general sport variable, particularly when comparing invasion and combat modalities with the reference level (court/net wall). In addition, both sex and age variables demonstrated significant differences in all physical fitness and body composition parameters. The predictions of physical fitness and body composition based on mean values for each age, sex, and general sport classification are shown in Figs. 1 and 2, respectively. Table 4 presents the performance metrics of the models for the different physiological variables. Adjusted R2 values indicate the proportion of variance explained by the models, ranging from 0.42 for VO2max to 0.78 for fat mass, muscle mass, and handgrip strength. The model standard deviation (GAM SD) is noticeably lower than the standard deviation of the original variable (Original SD) for most variables, suggesting improved model precision. Deviance explained (Dev %) indicates the share of the deviance accounted for by each GAM. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) describe the accuracy of predictions, showing generally low error rates across variables. The index of agreement (IOA) indicates strong agreement between observed and predicted values. Overall, the GAM models show promising accuracy and precision in predicting physiological variables, as evidenced by the reduction in errors and the agreement with observed data.

Discussion
The present study aimed to analyse the relationship between sport modalities practiced, physical fitness, body composition, and health habits in an active young population, using a statistical model for prediction. The main finding was that the sport modality significantly predicted VO2max. Additionally, variables such as sex, age group, hours per week, and years of sports practice were found to be the best predictors of physical fitness and body composition. For this reason, given that physical fitness is considered to be a useful marker of health in childhood and adolescence 32, it is possible to estimate health-associated values through a practical battery of field tests in a young population.

Physical fitness
Regular sports practice from an early age influences the peak of VO2max 14. This influence is attributed to the varying intensities and physiological stimuli inherent in each sport modality 14. Our predictive model (Table 3) indicates that invasion sports had higher VO2max values compared to court or net sports. The scientific literature has shown a positive association between invasion sports and musculoskeletal, cardiovascular, and metabolic adaptations, primarily due to their high aerobic component 33,34. Consequently, the physiological adaptations resulting from the demands of a specific sport can explain these differences. However, our results revealed higher cardiorespiratory fitness values in court/net athletes compared to those practicing combat sports. This discrepancy can be associated with the explosive physiological patterns characteristic of combat sports, which do not emphasise training based on aerobic capacity 14. Regarding individual sports, no significant results were observed in the aerobic capacity test. These findings can be attributed to various factors, including the potential influence of growth, development, and genetics, which are more determinant than the impact of the sport discipline 35.
On the other hand, all fitness variables had a significant influence when analysed by sex, showing a decrease in girls compared to boys. These disparities primarily stem from the pivotal role of sexual maturation, as boys exhibit higher testosterone levels than girls during maturing ages 36. In addition, these differences can strongly affect the development of physical and sporting skills 37. Similarly, age was also a significant predictor of physical fitness values. Our results showed a decrease in VO2max (Fig. 1). This agrees with the study by Patel 37, in which younger subjects showed a higher resting heart rate and a higher maximal heart rate, which is directly associated with higher VO2max values. Additionally, this decline in VO2max may be influenced by increased adiposity relative to body weight 17. The opposite effect is observed in the muscle strength results. As athletes get older, their values in the handgrip strength and vertical jump tests increase. Biological maturity is a key factor in variances in physical fitness 38. Hence, these changes in muscle strength tests are related to the hormonal changes in puberty 38. This gain is more pronounced in males, who experience an increase in testosterone levels of almost three times 36. This disparity may explain why more mature participants demonstrate better results in strength tests, especially males.
Table 3. Parametric coefficients of the GAM model for dependent variables and approximate significance of smooth terms. VO2max maximal oxygen uptake, MFR muscle-fat-ratio, HBMI handgrip strength to body mass index, Estimate regression coefficients, Std Error standard error, F statistic; *Significant differences (p < 0.05); **Significant differences (p < 0.01), edf effective degrees of freedom. The reference intercept of the general_sport group corresponds to the court/net modality, while the intercept of the skill_sport corresponds to the opened modality. The intercept of the sex corresponds to the girls group, and the intercept of the age corresponds to the 6-9 group. Regarding the hour_per_week, years_of_practice, and kidmed groups, the intercept is set at 0.
Previous research has demonstrated that regular participation in sports and physical activity (PA) is associated with improved physical fitness 16. The type of sport, intensity, frequency, and longer training sessions are associated with positive values for cardiorespiratory fitness and muscular strength 39. In this sense, Table 3 shows that the frequency of sports practice and previous sports experience significantly increase maximal oxygen consumption and performance in the vertical jump test. The levels of VO2max and vertical jump height improve with each additional hour of practice per week, as well as with each accumulated year of sports experience. Finally, the model predicted higher values for VO2max and HBMI ratio when athletes had higher adherence to the MD. The literature has demonstrated the importance of adherence to the MD in fitness parameters 3,40. This can be attributed to the improvement in body composition and cardiorespiratory profile resulting from the physical activity in which the participants engage 41. Therefore, early adherence to the MD is crucial to prevent an increased risk of cardiovascular disease, obesity, or metabolic syndrome in adulthood 42. In conclusion, the combination of high adherence to the MD and an active sport practice seems to provide the highest protection against cardiometabolic risk 43.

Body composition
Despite the research affirming that sport type influences and generates specific adaptations in anthropometric and physiological aspects in children and young people 14,44, no significant differences were found in our results between sport type and body composition. This effect can be explained by not considering whether the subjects practiced different sports modalities, which disregarded the anthropometric adaptations produced by the practice of various sports. However, our results revealed differences in all body composition variables examined by sex. These findings are influenced by genetic and physiological distinctions between sexes 14. After puberty, girls tend to increase in body fat mass, whereas boys show an increase in lean mass 45, the latter associated with elevated testosterone levels 36. This pattern would explain why boys have a greater ability to achieve higher levels of strength and cardiorespiratory fitness. Similar to the physical fitness parameters, age also behaves as a good predictor of muscle mass, fat mass, MFR, and HBMI. Athletes in the 10-13 and 14-17 years age groups showed higher values of muscle mass and lower values of body fat mass with respect to the 6-9 years group, especially in boys (Fig. 2). This trend in body composition may be attributed to their active participation in physical activities. Significant differences were found between regular sports practice and body composition. With each additional hour of sports practice, there is a decrease in the percentages of fat mass and MFR, and an increase in muscle mass. Ara et al. 16 suggest that engaging in at least 3 h of sports per week in children is effective in reducing total and regional fat mass while increasing total lean mass.
This research has certain limitations that need to be considered. The sample in our study displays a notable sex imbalance, with significantly more boys (1528) than girls (727), potentially introducing bias into our findings. Furthermore, differences in sample size across various sports and age groups may have affected outcomes, considering factors like physical fitness, sport-specific physiological changes, anatomical characteristics, and the pubertal stage of each subject. Despite this imbalance, our sample size remains substantial and likely reflects societal norms regarding sports participation frequency, possibly justifying these disparities. It is crucial to note that our observational study design means sample selection is not controlled. This underscores the significance of acknowledging that secondary sports could lead to physiological and physical changes in individuals, emphasizing the necessity of incorporating these aspects into future research.
The study concludes that the specific sport modality, frequency, intensity, and longer training affect body composition and fitness parameters in the child population. Our results reaffirm the existence of a relationship between regular participation in sports activities and anthropometric measurements, a powerful indicator of health. Therefore, this research provides further evidence for policy makers and researchers engaged in the promotion and development of active and healthy behaviours to take into account the importance of multilateral progress during the development of children. It is important to focus on sports and other factors that are associated with lower values in health indicators, such as cardiorespiratory fitness in combat sports. Strategies should be aimed at athletes who play these sports in order to achieve optimal specific values for children, taking into consideration factors such as sex, chronological age, and maturational period. In this way, possible health problems due to a cardiorespiratory-level deficit in adulthood could be avoided.

Figure 1. Predicted values and standard error of physical fitness variables for boys and girls. Predicted points are estimated based on the mean height and weight of each age and general sport classification. VO2max maximal oxygen uptake, HBMI handgrip strength-to-body mass index.

Figure 2. Predicted values and standard error of body composition variables for boys and girls. Predicted points are estimated based on the mean height and weight of each age and sport classification. MFR muscle-fat-ratio.

Table 2. LDA accuracy and division method for the age variable. Marginal row results are based on the percentage of successful classification. Age group refers to the 6-9, 10-13, and 14-17 years age groups.

Table 4. Model accuracy, error and standard deviation. R2 adj adjusted R2, Original SD standard deviation of the original variable, GAM SD model standard deviation, Dev (%) deviance explained, MAE mean absolute error, MAPE mean absolute percentage error, RMSE root mean square error, IOA index of agreement, AIC Akaike information criterion, VO2max maximal oxygen uptake, MFR muscle-fat-ratio, HBMI handgrip strength to body mass index.